Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model.
Abstract
Species distribution models are usually evaluated with cross-validation. In this procedure, evaluation statistics are computed from model predictions for presence and absence sites that were not used to train (fit) the model. Using data for 226 species from six regions and two species distribution modeling algorithms (Bioclim and MaxEnt), I show that this procedure is highly sensitive to "spatial sorting bias": the difference between the geographic distance from testing-presence to training-presence sites and the geographic distance from testing-absence (or testing-background) to training-presence sites. I propose the use of pairwise distance sampling to remove this bias, and the use of a null model that considers only the geographic distance to training sites to calibrate cross-validation results for remaining bias. Model evaluation results, measured as the area under the receiver-operator curve (AUC), were strongly inflated: the null model performed better than MaxEnt for 45% and better than Bioclim for 67% of the species. Spatial sorting bias and AUC values increased when using partitioned presence data and random-absence data instead of independently obtained presence-absence testing data from systematic surveys. Pairwise distance sampling removed spatial sorting bias, yielding null models with an AUC close to 0.5, such that AUC was the same as null-model-calibrated AUC (cAUC). This adjustment strongly decreased AUC values and changed the ranking among species. Cross-validation results for different species are only comparable after removal of spatial sorting bias and/or calibration with an appropriate null model.
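The three ideas in the abstract (spatial sorting bias, pairwise distance sampling, and a distance-only null model) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper's own implementation: coordinates are treated as planar, the bias is summarized as a ratio of mean nearest-training distances, the sampling is a simple greedy match, and the calibration formula `AUC + 0.5 - max(0.5, null AUC)` is one form that is consistent with the abstract's statement that AUC equals cAUC when the null AUC is close to 0.5. All function names are hypothetical.

```python
import math

def nearest_distance(points, reference):
    # Planar Euclidean distance from each point to its nearest reference point.
    return [min(math.dist(p, r) for r in reference) for p in points]

def spatial_sorting_bias(test_pres, test_abs, train_pres):
    # Assumed summary: ratio of mean nearest-training distances.
    # A value near 1 suggests little bias; near 0 suggests strong bias.
    dp = nearest_distance(test_pres, train_pres)
    da = nearest_distance(test_abs, train_pres)
    return (sum(dp) / len(dp)) / (sum(da) / len(da))

def pairwise_distance_sample(test_pres, test_abs, train_pres):
    # Greedy sketch: pair each testing presence with the unused testing
    # absence whose distance to the nearest training presence is most similar.
    dp = nearest_distance(test_pres, train_pres)
    da = nearest_distance(test_abs, train_pres)
    available = list(range(len(test_abs)))
    kept_pres, kept_abs = [], []
    for i, d in enumerate(dp):
        if not available:
            break
        j = min(available, key=lambda k: abs(da[k] - d))
        available.remove(j)
        kept_pres.append(test_pres[i])
        kept_abs.append(test_abs[j])
    return kept_pres, kept_abs

def auc(pos_scores, neg_scores):
    # Rank-based AUC: share of presence/absence pairs ranked correctly,
    # counting ties as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def null_model_auc(test_pres, test_abs, train_pres):
    # Geographic null model: the score is higher the closer a site is
    # to a training presence; no environmental predictors are used.
    pos = [-d for d in nearest_distance(test_pres, train_pres)]
    neg = [-d for d in nearest_distance(test_abs, train_pres)]
    return auc(pos, neg)

def calibrated_auc(model_auc, null_auc):
    # Assumed calibration: subtract the null model's excess over 0.5.
    return model_auc + 0.5 - max(0.5, null_auc)

# Toy data (hypothetical): absences are on average farther from training sites.
train_pres = [(0, 0), (1, 0)]
test_pres = [(0, 1), (1, 1)]
test_abs = [(10, 10), (0, -1), (20, 0), (1, -1)]

biased_null = null_model_auc(test_pres, test_abs, train_pres)
pres_s, abs_s = pairwise_distance_sample(test_pres, test_abs, train_pres)
sampled_null = null_model_auc(pres_s, abs_s, train_pres)
```

In this toy case the distance-only null model scores above 0.5 on the raw testing data simply because the absences lie farther from the training sites, and it drops to 0.5 once the testing absences are matched to the presences by distance.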
Journal: Ecology
Volume 93, Issue 3
Publication date: 2012